Empirical Observations of Probabilistic Heuristics for the Clustering Problem

Author

  • Amin Vahdaty
Abstract

We empirically investigate a number of strategies for solving the clustering problem under the minimum variance error criterion. First, we compare the behavior of four algorithms: 1) randomized minimum spanning tree, 2) hierarchical grouping, 3) randomized maximum cut, and 4) standard k-means. We test these algorithms on a large corpus of both contrived and real-world data sets and find that standard k-means performs best. We find, however, that standard k-means can, with non-negligible probability, do a poor job of optimizing the minimum variance criterion. We therefore investigate various randomized k-means modifications. We empirically find that by running randomized k-means only a modest number of times, the probability of a poor solution becomes negligible. Using a large number of CPU hours to experimentally derive the apparently optimal solutions, we also find that randomized k-means has the best rate of convergence to this apparent optimum.
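
The idea of rerunning k-means a modest number of times and keeping the best result can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not say how the randomization is done, so the sketch assumes each run starts from k centers sampled at random from the data and that the minimum variance criterion is the total squared distance from each point to its nearest center. The function names (kmeans_once, kmeans_restarts) and the parameters restarts and max_iter are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from random initial centers (an assumed scheme)."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to its cluster mean;
        # an empty cluster keeps its previous center.
        new_centers = [
            centroid(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Total squared error: the quantity being minimized.
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return error, centers


def kmeans_restarts(points, k, restarts=10, seed=0):
    """Repeat randomized k-means and keep the lowest-error run."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[0])


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    best_error, best_centers = kmeans_restarts(data, k=2)
    print(best_error, sorted(best_centers))
```

Because the best run is selected by error alone, adding restarts can only lower or maintain the reported error, which is the property the abstract relies on when it says the probability of a poor solution becomes negligible.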

Similar resources

Bi-criteria scheduling in a hybrid flow shop environment with non-identical machines

This study considers scheduling in a hybrid flow shop environment with unrelated parallel machines to minimize the mean job tardiness and the mean job completion time. This problem has not been studied in the literature so far. The flexible flow shop environment is applicable in various industries such as wire and spring manufacturing, electronic industries and production lines. After modeling the p...

Repeated Record Ordering for Constrained Size Clustering

One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...

Ranking Pharmaceutics Industry Using SD-Heuristics Approach

In recent years, the stock exchange has become one of the most attractive and fastest-growing businesses with respect to investment and profitability. Applying a scientific approach in this field is difficult, however, because of the variety and complexity of the decision-making factors involved. This paper tries to deliver a new solution for portfolio selection based on multi criteria decision making literatu...

Application of Probabilistic Clustering Algorithms to Determine Mineralization Areas in Regional-Scale Exploration Studies

In this work, we aim to identify the mineralization areas for the next exploration phases. To that end, probabilistic clustering algorithms are used to determine the multi-element geochemical anomalies, because they use appropriate measures, can work with datasets that have missing values, and avoid becoming trapped in local optima. Four probabilistic clustering algorithms, namely PHC,...

Persian Handwritten Digit Recognition Using Particle Swarm Probabilistic Neural Network

Handwritten digit recognition can be categorized as a classification problem. The Probabilistic Neural Network (PNN) is one of the most effective and useful classifiers and is based on Bayes' rule. In this paper, a combination of an intelligent clustering method and a PNN is used to recognize Persian (Farsi) handwritten digits. Hoda database, which includes 80000 P...

Publication date: 1997